Distributed Value Functions
Authors
Abstract
Many interesting problems that are candidates for reinforcement learning (RL), such as power grids, network switches, and traffic flow, also have properties that make distributed solutions desirable. We propose an algorithm for distributed reinforcement learning based on distributing the representation of the value function across nodes. Each node in the system can only sense state locally, choose actions locally, and receive reward locally (the goal of the system is to maximize the sum of the rewards over all nodes and over all time). However, each node is allowed to give its neighbors the current estimate of its value function for the states it passes through. We present a value function learning rule that uses this information and allows each node to learn a value function that estimates a weighted sum of future rewards for all the nodes in the network. With this representation, each node can choose actions to improve the performance of the overall system. We demonstrate our algorithm on the distributed control of a simulated power grid. We compare it against other methods, including use of a global reward signal, nodes that act locally with no communication, and nodes that share reward (but not value function) information with each other. Our results show that the distributed value function algorithm outperforms the others, and we conclude with an analysis of which problems are best suited for distributed value functions and the new research directions opened up by this work.
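The abstract does not state the learning rule itself, so the following is only a minimal sketch of one way such a rule could look, assuming tabular Q-learning at each node. The names `Node`, `distributed_step`, `weights`, and `transitions` are hypothetical and are not taken from the paper. Each node bootstraps from a weighted sum of the value estimates reported by itself and its neighbors, rather than from its own estimate alone.

```python
from collections import defaultdict


class Node:
    """One node with a local tabular Q-function (hypothetical structure)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)
        self.q = defaultdict(float)   # (state, action) -> estimate

    def value(self, state):
        # Local value estimate: best Q over this node's own actions.
        return max(self.q[(state, a)] for a in self.actions)

    def update(self, state, action, reward, weighted_next_values):
        # Bootstrap from the weighted neighbor values, not only our own.
        target = reward + self.gamma * weighted_next_values
        key = (state, action)
        self.q[key] += self.alpha * (target - self.q[key])


def distributed_step(nodes, weights, transitions):
    """Apply one synchronous update to every node.

    nodes:       dict node_id -> Node
    weights:     weights[i][j] is the weight node i puts on node j's value
                 (an assumption; include j == i for the node's own value)
    transitions: transitions[i] = (state, action, reward, next_state),
                 all observed locally by node i
    """
    # Every node reports its value at its own next state before any update.
    reported = {i: nodes[i].value(transitions[i][3]) for i in nodes}
    for i, node in nodes.items():
        state, action, reward, _ = transitions[i]
        mix = sum(w * reported[j] for j, w in weights[i].items())
        node.update(state, action, reward, mix)


if __name__ == "__main__":
    nodes = {0: Node(["a"]), 1: Node(["a"])}
    weights = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
    # Only node 1 ever receives reward.
    transitions = {0: ("s", "a", 0.0, "s"), 1: ("s", "a", 1.0, "s")}
    for _ in range(2):
        distributed_step(nodes, weights, transitions)
    print(nodes[0].q[("s", "a")], nodes[1].q[("s", "a")])
```

In this toy setup only node 1 receives reward, yet node 0's estimate becomes positive after the second step because it mixes in node 1's reported value. That is the sense in which each node's value function can reflect rewards earned elsewhere in the network.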
Similar resources
Boundary Value Problems in Generalized Thermodiffusive Elastic Medium
In the present study, boundary value problems in a generalized thermodiffusive elastic medium are investigated for an inclined load. The inclined load is assumed to be a linear combination of a normal load and a tangential load. The Laplace transform with respect to the time variable and the Fourier transform with respect to the space variable are applied to solve the problem. As an application of t...
Optimal Location and Sizing of Distributed Generations in Distribution Networks Considering Load Growth using Modified Multi-objective Teaching Learning Based Optimization Algorithm
Abstract: This paper presents a modified method based on the teaching learning based optimization algorithm to solve the problem of single- and multi-objective optimal location of distributed generation units to cope with load growth in the distribution network. Minimizing losses, voltage deviation, and energy cost, and improving voltage stability, are the objective functions in this problem. Load g...
Least squares weighted residual method for finding the elastic stress fields in rectangular plates under uniaxial parabolically distributed edge loads
In this work, the least squares weighted residual method is used to solve the two-dimensional (2D) elasticity problem of a rectangular plate of in-plane dimensions 2a × 2b subjected to parabolic edge tensile loads applied at the two edges x = ±a. The problem is expressed using the Beltrami–Michell stress formulation. Airy’s stress function method is applied to the stress compatibility equation, and th...
Experimental and Numerical Flow Investigation of Intake Manifold and Multi Criteria Decision Making on 3-cylinder SI Engine using Technique for Order of Preference by Similarity to Ideal Solution (RESEARCH NOTE)
In this paper, the technique for order of preference by similarity to ideal solution (TOPSIS) method is used to find the best compromise design of the intake manifold for a 3-cylinder engine, considering mean value of torque, torque at 3500 rpm, mean value of brake mean specific consumption (BSFC), and BSFC at 3500 rpm as four objective functions. To calculate the objective functions, engine simulation i...
Adaptive and Non-adaptive Distribution Functions for DSA
Distributed hill-climbing algorithms are a powerful, practical technique for solving large Distributed Constraint Satisfaction Problems (DSCPs) such as distributed scheduling, resource allocation, and distributed optimization. Although incomplete, an ideal hill-climbing algorithm finds a solution that is very close to optimal while also minimizing the cost (i.e. the required bandwidth, processi...
Analysis of Indo-Iranian Trade
This paper focuses on Indo-Iranian merchandise trade for a period of 30 years, from 1980-81 to 2009-10. The study is based on econometric modeling comprising 3 versions of the RWM model and alternative forms and specifications of regression functions. The RWM model(s) evaluate the stationary nature of time series data relating to GNP, exports, imports, and total merchandise Indo-Iranian trade. Dickey-...
Journal title:
Volume, issue:
Pages: -
Publication date: 1999